Efficient Approximation of Optimal Control for Markov Games
نویسندگان
چکیده
We study the time-bounded reachability problem for continuous-time Markov decision processes (CTMDPs) and games (CTMGs). Existing techniques for this problem use discretisation techniques to break time into discrete intervals, and optimal control is approximated for each interval separately. Current techniques provide an accuracy of O(ε) on each interval, which leads to an infeasibly large number of intervals. We propose a sequence of approximations that achieve accuracies of O(ε), O(ε), and O(ε), that allow us to drastically reduce the number of intervals that are considered. For CTMDPs, the performance of the resulting algorithms is comparable to the heuristic approach given by Buckholz and Schulz [6], while also being theoretically justified. All of our results generalise to CTMGs, where our results yield the first practically implementable algorithms for this problem. We also provide positional strategies for both players that achieve similar error bounds.
منابع مشابه
Efficient Approximation of Optimal Control for Continuous-Time Markov Games
We study the time-bounded reachability problem for continuous-time Markov decision processes (CTMDPs) and games (CTMGs). Existing techniques for this problem use discretisation techniques to break time into discrete intervals of size ε, and optimal control is approximated for each interval separately. Current techniques provide an accuracy of O(ε2) on each interval, which leads to an infeasibly...
متن کاملValue Function Approximation in Zero-Sum Markov Games
This paper investigates value function approximation in the context of zero-sum Markov games, which can be viewed as a generalization of the Markov decision process (MDP) framework to the two-agent case. We generalize error bounds from MDPs to Markov games and describe generalizations of reinforcement learning algorithms to Markov games. We present a generalization of the optimal stopping probl...
متن کاملOptimal Time-Abstract Schedulers for CTMDPs and Markov Games
We study time-bounded reachability in continuous-time Markov decision processes for time-abstract scheduler classes. Such reachability problems play a paramount rôle in dependability analysis and the modelling of manufacturing and queueing systems. Consequently, their analysis has been studied intensively, and techniques for the approximation of optimal control are well understood. From a mathe...
متن کاملUtilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs
Multi agent Markov decision processes (MMDPs), as the generalization of Markov decision processes to the multi agent case, have long been used for modeling multi agent system and are used as a suitable framework for Multi agent Reinforcement Learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDP is proposed. In the proposed algorithm, MMDP ...
متن کاملA Novel Successive Approximation Method for Solving a Class of Optimal Control Problems
This paper presents a successive approximation method (SAM) for solving a large class of optimal control problems. The proposed analytical-approximate method, successively solves the Two-Point Boundary Value Problem (TPBVP), obtained from the Pontryagin's Maximum Principle (PMP). The convergence of this method is proved and a control design algorithm with low computational complexity is present...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1011.0397 شماره
صفحات -
تاریخ انتشار 2010